187-2007: SAS® Macros for Assessing the Symmetry of a Data Set
Abstract
Skewness indicates a lack of symmetry in a distribution. The coefficient of skewness is the measure most commonly used to identify a lack of symmetry in the underlying data, although graphical procedures can also be effective. We have developed two comprehensive and efficient SAS® macros, %Symm and %Symmchk, for computing the various skewness measures. They primarily use the UNIVARIATE procedure in SAS® and a few other built-in macro calls. The macros provide a menu of options to perform all computations efficiently and suggest an appropriate power transformation, if one exists, to make an asymmetric distribution symmetric. This latter part is noteworthy in that we not only address the problem, i.e., identify whether a given data set has an underlying symmetric distribution, but also provide a solution. Real data from clinical studies will be used to demonstrate the methods and their application using the SAS® macros.

INTRODUCTION

The first step in any statistical analysis includes summarizing the characteristics of the underlying data. All standard statistical packages routinely provide summary statistics, and these often include a sample skewness score, which is a measure of symmetry. Symmetry is a rather complex property of probability distributions, and it is difficult to identify deviations from it in a small number of observations. Broadly speaking, a data set or a distribution is said to be symmetric if it looks the same to the right and left of the center point. One of the numerous reasons for checking symmetry in a given data set is that many statistical tests rely strongly on the assumption of normality, which in turn relies on symmetry. Thus, a skewness measure can provide valuable information on issues such as data transformation, outlier detection, and distribution fitting, so as to ensure that an appropriate analysis procedure (parametric versus non-parametric) is employed.

In this paper, we present a novel approach to assessing symmetry that is simple and computationally less complex. This approach rests on the inherent characteristic of a symmetric distribution: the corresponding upper and lower percentiles are equidistant from its median [1]. We also introduce the other commonly used skewness measures, namely the traditional coefficient of skewness, the skewness index based on the L-moments discussed by Hosking [2], the Quartile and Octile skewness coefficients proposed by Hinkley [3], and the symmetry test developed by Randles et al. [4] We discuss the pros and cons of the different approaches and introduce our comprehensive SAS® macro, which performs these computations efficiently and also provides a possible power law transformation to make the variable symmetric where appropriate [1,5]. Two slightly different versions of the macro are available for checking for symmetry.

METHODS

Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the ordered random sample of size $n$ from a distribution of the random variable $X$ with mean $\mu$ and variance $\sigma^2$.

1) The coefficient of skewness: $S_1 = \dfrac{m_3}{m_2^{3/2}}$, where $m_r = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r$. For symmetrical distributions, $S_1$ has expectation 0, i.e., when the data are symmetric, the sample skewness coefficient is near zero. If $S_1 > 0$, then the distribution is asymmetric with a positive skew, and if $S_1 < 0$, then the distribution is asymmetric with a negative skew.

2) Skewness index based on the L-moments [2]: $S_2 = \dfrac{l_3}{l_2}$, where $l_2 = 2w_2 - \bar{x}$, $l_3 = 6w_3 - 6w_2 + \bar{x}$, and
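As a rough illustration of two ideas described above, the sketch below computes the coefficient of skewness $S_1 = m_3 / m_2^{3/2}$ and checks whether matching lower and upper percentiles are equidistant from the median. It is written in Python rather than SAS and is not the %Symm or %Symmchk macro. The function names `moment_skewness` and `percentile_gaps` are hypothetical, and the choice of octiles and of the "inclusive" quantile method are assumptions made only for this example; PROC UNIVARIATE may use a different percentile definition.

```python
import statistics


def moment_skewness(xs):
    """Return S1 = m3 / m2**1.5, where m_r = (1/n) * sum((x - mean)**r)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return m3 / m2 ** 1.5


def percentile_gaps(xs, n_quantiles=8):
    """Compare matching lower/upper quantiles with the median.

    For each pair, return (upper - median) - (median - lower).
    Values near zero suggest symmetry; positive values suggest a
    longer right tail. The number of quantiles and the 'inclusive'
    method are assumptions made for this sketch.
    """
    cuts = statistics.quantiles(xs, n=n_quantiles, method="inclusive")
    median = statistics.median(xs)
    k = len(cuts)
    gaps = []
    for j in range(k // 2):
        lower, upper = cuts[j], cuts[k - 1 - j]
        gaps.append((upper - median) - (median - lower))
    return gaps


if __name__ == "__main__":
    symmetric = [1, 2, 3, 4, 5]
    right_skewed = [1, 2, 3, 4, 20]
    for name, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
        print(name, moment_skewness(data), percentile_gaps(data))
```

For the small right-skewed sample in the usage example, the quartiles alone are still equidistant from the median, and only the outermost octile pair reveals the long right tail. This shows why looking at several percentile pairs can be more informative than looking at one.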
Similar resources
IRT-FIT: SAS® Macros for Fitting Item Response Theory (IRT) Models
Psychometrics has recently seen the development of complex measurement models to better represent test and item data. Item Response Theory (IRT), in particular, comprises a set of non-linear latent variable models that appear to have several conceptual and empirical properties that make them more valuable in practice than classical test theory methods. However, IRT-based models typically requir...
Statistical power analysis for growth curve models using SAS.
Power analysis is critical in research designs. This study discusses a simulation-based approach utilizing the likelihood ratio test to estimate the power of growth curve analysis. The power estimation is implemented through a set of SAS macros. The application of the SAS macros is demonstrated through several examples, including missing data and nonlinear growth trajectory situations. The resu...
Reading Between the Lines: Distinguishing Macro Code from Open Code in Macros
Have you ever been confused by the sight of bad “SAS® grammar” such as statements that read %IF &X=A %THEN IF X=A...? Have you ever puzzled over how to iteratively generate DATA step DO loops or parts of them? Ever wonder why macros sometimes contain consecutive semicolons? If so, you are not alone. The macro facility enables us to generate open code that varies with circumstances, but of cours...
Creating Clinical Trial Summary Tables Containing P-Values: A Practical Approach Using Standard SAS Macros
P-value is a key criterion for evaluating the effectiveness and safety of new drugs in clinical trials, particularly in comparative studies. However, p-values are generally not presented in data summary tables generated with SAS software, because of the complexity of incorporating p-values into a formatted table that contains summary statistics, such as mean, proportion, or standard deviation. ...
Upwards: The Role of Analysis in Cost-Optimal SAS+ Planning
In this paper, we will describe the planner UPWARDS, competing in the sixth international planning competition. Our primary focus will be on the novel contributions of the planner: in particular, the application of symmetry breaking, mobile analysis and tunnel macros in sequential cost-optimal SAS+ planning. A brief outline of UPWARDS itself is then
Publication date: 2007